Practical Bayesian Modeling with Stan and brms

Author

Masatoshi Katabuchi

Published

October 31, 2025

1 Docker (optional)

Run this in your terminal to start an RStudio Server with all the necessary packages installed.

docker run -d -p 8787:8787 \
  -e PASSWORD=123456  \
  mattocci/afec-bayes:prod

Open 127.0.0.1:8787 in your browser and log in with the username rstudio and the password 123456 (or your own password).

Note: Because you’re running this container on your own machine and mapping only to 127.0.0.1:8787, the RStudio Server isn’t accessible from outside your computer. So, using a simple password like 123456 is fine and safe for local development.

2 Setup

library(brms)
library(cmdstanr)
library(tidyverse)
library(bayesplot)
library(patchwork)
# cmdstanr::set_cmdstan_path("/opt/cmdstan/cmdstan-2.37.0")

my_theme <- theme_bw() + theme(
    axis.text = element_text(size = 14),
    strip.text = element_text(size = 14)
  )

theme_set(my_theme)
library(tictoc)

3 Normal model

Leaf mass per area (LMA) is an important trait that reflects plant strategies. Across species, LMA often follows a log-normal distribution.

\[ \text{log}(y_i) \sim N(\mu, \sigma) \]

Q: What are the mean (\(\mu\)) and standard deviation (\(\sigma\)) of log LMA?
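Because the model is specified on the log scale, the log-scale parameters map back to the original LMA scale via the standard log-normal identities:

\[ \text{median}(y) = e^{\mu}, \qquad \text{E}(y) = e^{\mu + \sigma^2/2} \]

So a posterior for \(\mu\) and \(\sigma\) immediately gives posteriors for the median and mean LMA on the original scale.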

3.1 Dummy data

set.seed(123)
n <- 100
mu <- 4.6
sigma <- 0.7
y <- rnorm(n, mean = mu, sd = sigma)

hist(y)

CmdStan (via cmdstanr) takes data as a named list.

normal_list <- list(
  y = y,
  N = length(y)
)

brms takes data as a data frame (or tibble), like lme4::lmer and stats::glm.

normal_df <- tibble(
  y = y,
  N = length(y)
)

3.2 CmdStan

We need to write and save Stan programs (.stan files).

stan/normal.stan

 // observed data
data {
  int<lower=0> N;
  vector[N] y;
}

// parameters to be estimated
parameters {
  real mu;
  real<lower=0> sigma;
}

model {
  // likelihood
  y ~ normal(mu, sigma);
  // priors
  mu ~ normal(0, 10);
  sigma ~ cauchy(0, 2.5);
} 

stan/normal_vague.stan

 data {
  int<lower=0> N;
  vector[N] y;
}

parameters {
  real mu;
  real<lower=0> sigma;
}

model {
  y ~ normal(mu, sigma);
  mu ~ normal(0, 1e+4); // Vague prior for mu
  sigma ~ normal(0, 1e+4); // Vague prior for sigma
} 

These commands compile the models, translating Stan code to C++ and then into machine-executable files. Compilation takes a while (about 20 seconds to 1 minute), but only the first time.

normal_mod <- cmdstan_model("stan/normal.stan")
normal_vague_mod <- cmdstan_model("stan/normal_vague.stan")
# Persist CSV outputs to avoid /tmp cleanup during render
normal_out_dir <- file.path("assets", "cmdstan", "normal")
if (!dir.exists(normal_out_dir)) dir.create(normal_out_dir, recursive = TRUE)

normal_fit <- normal_mod$sample(
  data = normal_list,
  seed = 123,
  chains = 4,
  parallel_chains = 4,
  iter_warmup = 1000,  # number of warmup iterations
  iter_sampling = 1000, # number of sampling iterations
  refresh = 500,        # print update every 500 iters
  output_dir = normal_out_dir
)
Running MCMC with 4 parallel chains...

Chain 1 Iteration:    1 / 2000 [  0%]  (Warmup) 
Chain 1 Iteration:  500 / 2000 [ 25%]  (Warmup) 
Chain 1 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
Chain 1 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
Chain 1 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
Chain 1 Iteration: 2000 / 2000 [100%]  (Sampling) 
Chain 1 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Chain 1 Exception: normal_lpdf: Scale parameter is 0, but must be positive! (in '/tmp/RtmpS3fhTi/model-497d52bc13b08.stan', line 15, column 2 to column 24)
Chain 1 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
Chain 1 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
Chain 1 
Chain 2 Iteration:    1 / 2000 [  0%]  (Warmup) 
Chain 2 Iteration:  500 / 2000 [ 25%]  (Warmup) 
Chain 2 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
Chain 2 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
Chain 2 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
Chain 2 Iteration: 2000 / 2000 [100%]  (Sampling) 
Chain 3 Iteration:    1 / 2000 [  0%]  (Warmup) 
Chain 3 Iteration:  500 / 2000 [ 25%]  (Warmup) 
Chain 3 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
Chain 3 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
Chain 3 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
Chain 3 Iteration: 2000 / 2000 [100%]  (Sampling) 
Chain 4 Iteration:    1 / 2000 [  0%]  (Warmup) 
Chain 4 Iteration:  500 / 2000 [ 25%]  (Warmup) 
Chain 4 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
Chain 4 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
Chain 4 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
Chain 4 Iteration: 2000 / 2000 [100%]  (Sampling) 
Chain 1 finished in 0.0 seconds.
Chain 2 finished in 0.0 seconds.
Chain 3 finished in 0.0 seconds.
Chain 4 finished in 0.0 seconds.

All 4 chains finished successfully.
Mean chain execution time: 0.0 seconds.
Total execution time: 0.2 seconds.

We don’t have to recompile the model when we want to change the number of iterations, chains, or data.
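For example, a quick exploratory run can reuse the already compiled model object; a sketch using the same data list as above:

```r
# Reuse the compiled model with different sampling settings:
# no recompilation happens because the Stan code is unchanged.
normal_fit_short <- normal_mod$sample(
  data = normal_list,
  seed = 123,
  chains = 2,
  iter_warmup = 500,
  iter_sampling = 500,
  refresh = 0
)
```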

normal_vague_fit <- normal_vague_mod$sample(
  data = normal_list,
  seed = 123,
  chains = 4,
  parallel_chains = 4,
  refresh = 500 # print update every 500 iters
)

3.2.1 Summary

We use this output when writing manuscripts and for model diagnostics.

normal_fit$summary()
# A tibble: 3 × 10
  variable   mean median     sd    mad     q5    q95  rhat ess_bulk ess_tail
  <chr>     <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl> <dbl>    <dbl>    <dbl>
1 lp__     -6.34  -6.03  1.00   0.721  -8.30  -5.39  1.00     1927.    2334.
2 mu        4.66   4.66  0.0659 0.0651  4.55   4.77  1.000    3370.    2293.
3 sigma     0.646  0.643 0.0456 0.0455  0.575  0.725 1.00     3819.    2828.
  • lp__: log posterior.
    • \(\text{log}\; p(\theta \mid \text{data}) \propto \text{log}\; p(\text{data} \mid \theta) + \text{log}\; p(\theta)\)
  • rhat (Gelman-Rubin statistic): should be close to 1.
    • Measures the convergence of the chains.
    • rhat > 1.1: Definitely bad.
    • 1.01 < rhat < 1.1: Suspicious.
    • rhat <= 1.01: Good.
  • ess_bulk and ess_tail: Effective sample size for bulk and tail of the posterior distribution.
    • ess_bulk: sampling efficiency for the bulk posteriors (e.g., mean, SD).
    • ess_tail: sampling efficiency for the tails (e.g., quantiles like 2.5%, 97.5%).
    • ess_bulk, ess_tail > 400: Good.
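These thresholds can be checked programmatically from the summary table; a minimal sketch using dplyr (loaded via tidyverse):

```r
# Flag any parameters whose diagnostics fall outside the guidelines above.
# An empty result means all parameters pass these checks.
normal_fit$summary() |>
  filter(rhat > 1.01 | ess_bulk < 400 | ess_tail < 400)
```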

3.2.2 Draws (posterior samples)

For each parameter, we have 1000 iterations \(\times\) 4 chains = 4,000 posterior draws. We use these for visualization and diagnostics.

normal_fit$draws()
# A draws_array: 1000 iterations, 4 chains, and 3 variables
, , variable = lp__

         chain
iteration    1    2     3    4
        1 -5.4 -5.5 -10.3 -5.5
        2 -5.5 -6.8  -5.4 -5.7
        3 -5.5 -6.7  -6.7 -6.1
        4 -6.3 -6.1  -5.8 -6.8
        5 -6.1 -5.5  -5.7 -6.0

, , variable = mu

         chain
iteration   1   2   3   4
        1 4.6 4.6 4.5 4.7
        2 4.6 4.8 4.7 4.7
        3 4.6 4.8 4.6 4.6
        4 4.7 4.7 4.7 4.7
        5 4.7 4.6 4.7 4.6

, , variable = sigma

         chain
iteration    1    2    3    4
        1 0.65 0.66 0.63 0.63
        2 0.64 0.67 0.62 0.66
        3 0.63 0.64 0.70 0.60
        4 0.68 0.67 0.68 0.57
        5 0.67 0.65 0.60 0.59

# ... with 995 more iterations

Trace plots help diagnose convergence and mixing of the chains.

normal_draws <- as_draws_df(normal_fit$draws())

color_scheme_set("viridis")
mcmc_trace(normal_draws, pars = c("mu", "sigma"))

Histograms of the posterior distributions of the parameters.

mcmc_hist(normal_draws, pars = c("mu", "sigma"))
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
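The binning message is harmless; to silence it, pass a binwidth through mcmc_hist (it is forwarded to geom_histogram). The values here are chosen by eye for these posteriors:

```r
# Explicit binwidth instead of the default bins = 30
mcmc_hist(normal_draws, pars = c("mu", "sigma"), binwidth = 0.01)
```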

3.2.3 Diagnostic summary

normal_fit$diagnostic_summary()
$num_divergent
[1] 0 0 0 0

$num_max_treedepth
[1] 0 0 0 0

$ebfmi
[1] 1.005079 1.074466 1.096794 1.190007
  • num_divergent: the number of iterations (sampling transitions) in which the Hamiltonian trajectory diverged (divergent transitions).
    • Even 1–2 divergent transitions suggest the posterior may not be explored reliably, especially in hierarchical models.
    • We need to reparameterize the model or increase adapt_delta so the sampler takes smaller steps.
  • num_max_treedepth: the number of iterations that hit the maximum tree depth (i.e., ran out of leapfrog steps).
    • This can indicate a difficult posterior geometry or priors that are too tight.
    • We can increase max_treedepth to allow more steps, at the cost of longer computation.
  • ebfmi: Energy-Bayesian Fraction of Missing Information
    • Measures whether the resampled momentum generates enough variation in energy to explore the posterior efficiently.
    • Low ebfmi means the sampler may be stuck in tight regions and exploring poorly, even if Rhat and ESS look fine.
    • Guidelines:
      • ebfmi < 0.3: Bad
      • 0.3 < ebfmi <= 1: Acceptable
      • ebfmi >= 1: Good
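If these diagnostics flag problems, the sampler controls can be tightened when calling $sample(); a sketch (the values here are common starting points, not universal defaults):

```r
# Smaller steps (adapt_delta closer to 1) and a deeper tree:
# slower, but often resolves divergences and treedepth warnings.
normal_fit_careful <- normal_mod$sample(
  data = normal_list,
  seed = 123,
  chains = 4,
  parallel_chains = 4,
  adapt_delta = 0.99,
  max_treedepth = 12,
  refresh = 0
)
```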

3.2.4 Methods text example

“We estimated posterior distributions of all parameters using the Hamiltonian Monte Carlo (HMC) algorithm implemented in Stan (Carpenter et al., 2017) with weakly informative priors (Gelman et al., 2008). The HMC algorithm uses gradient information to propose new states in the Markov chain, leading to a more efficient exploration of the target distribution than traditional Markov chain Monte Carlo (MCMC) methods that rely on random proposals (Carpenter et al., 2017). This efficiency allows us to achieve convergence with fewer iterations than traditional MCMC methods. Four independent chains were run for 2000 iterations for each model with a warm-up of 1000 iterations. Convergence of the posterior distribution was assessed using the Gelman–Rubin statistic with a convergence threshold of 1.1 (Gelman et al., 2013), ensuring effective sample sizes greater than 400 (Vehtari et al., 2021), and by monitoring divergent transitions (Betancourt, 2016) for all parameters.”

3.3 brms

brms uses an lme4-like syntax.

normal_fit_brm0 <- brm(y ~ 1, family = gaussian(), data = normal_df)

We can also specify priors for the parameters. The prior argument can be a bit tricky, since it is not always obvious which parameter names correspond to which parts of the model, especially when the model becomes more complex.

We can also control the number of warmup iterations, total iterations, and other sampling settings.

The cmdstanr backend is generally faster than the default rstan backend.
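When in doubt about which parameter classes exist, brms can list the priors it expects for a given formula before fitting:

```r
# Show the default priors and their classes for this model specification
get_prior(y ~ 1, family = gaussian(), data = normal_df)
```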

normal_fit_brm <- brm(
  y ~ 1,
  family = gaussian(),
  data = normal_df,
  prior = c(
    prior(normal(0, 5), class = "Intercept"),
    prior(cauchy(0, 2.5), class = "sigma")
  ),
  seed = 123,
  iter = 2000, # total iterations (warmup + post-warmup)
  warmup = 1000,
  chains = 4, cores = 4,
  backend = "cmdstanr"
)
Start sampling
Running MCMC with 4 parallel chains...

Chain 1 Iteration:    1 / 2000 [  0%]  (Warmup) 
Chain 1 Iteration:  100 / 2000 [  5%]  (Warmup) 
Chain 1 Iteration:  200 / 2000 [ 10%]  (Warmup) 
Chain 1 Iteration:  300 / 2000 [ 15%]  (Warmup) 
Chain 1 Iteration:  400 / 2000 [ 20%]  (Warmup) 
Chain 1 Iteration:  500 / 2000 [ 25%]  (Warmup) 
Chain 1 Iteration:  600 / 2000 [ 30%]  (Warmup) 
Chain 1 Iteration:  700 / 2000 [ 35%]  (Warmup) 
Chain 1 Iteration:  800 / 2000 [ 40%]  (Warmup) 
Chain 1 Iteration:  900 / 2000 [ 45%]  (Warmup) 
Chain 1 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
Chain 1 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
Chain 1 Iteration: 1100 / 2000 [ 55%]  (Sampling) 
Chain 1 Iteration: 1200 / 2000 [ 60%]  (Sampling) 
Chain 1 Iteration: 1300 / 2000 [ 65%]  (Sampling) 
Chain 1 Iteration: 1400 / 2000 [ 70%]  (Sampling) 
Chain 1 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
Chain 1 Iteration: 1600 / 2000 [ 80%]  (Sampling) 
Chain 1 Iteration: 1700 / 2000 [ 85%]  (Sampling) 
Chain 1 Iteration: 1800 / 2000 [ 90%]  (Sampling) 
Chain 1 Iteration: 1900 / 2000 [ 95%]  (Sampling) 
Chain 1 Iteration: 2000 / 2000 [100%]  (Sampling) 
Chain 2 Iteration:    1 / 2000 [  0%]  (Warmup) 
Chain 2 Iteration:  100 / 2000 [  5%]  (Warmup) 
Chain 2 Iteration:  200 / 2000 [ 10%]  (Warmup) 
Chain 2 Iteration:  300 / 2000 [ 15%]  (Warmup) 
Chain 2 Iteration:  400 / 2000 [ 20%]  (Warmup) 
Chain 2 Iteration:  500 / 2000 [ 25%]  (Warmup) 
Chain 2 Iteration:  600 / 2000 [ 30%]  (Warmup) 
Chain 2 Iteration:  700 / 2000 [ 35%]  (Warmup) 
Chain 2 Iteration:  800 / 2000 [ 40%]  (Warmup) 
Chain 2 Iteration:  900 / 2000 [ 45%]  (Warmup) 
Chain 2 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
Chain 2 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
Chain 2 Iteration: 1100 / 2000 [ 55%]  (Sampling) 
Chain 2 Iteration: 1200 / 2000 [ 60%]  (Sampling) 
Chain 2 Iteration: 1300 / 2000 [ 65%]  (Sampling) 
Chain 2 Iteration: 1400 / 2000 [ 70%]  (Sampling) 
Chain 2 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
Chain 2 Iteration: 1600 / 2000 [ 80%]  (Sampling) 
Chain 2 Iteration: 1700 / 2000 [ 85%]  (Sampling) 
Chain 2 Iteration: 1800 / 2000 [ 90%]  (Sampling) 
Chain 2 Iteration: 1900 / 2000 [ 95%]  (Sampling) 
Chain 2 Iteration: 2000 / 2000 [100%]  (Sampling) 
Chain 3 Iteration:    1 / 2000 [  0%]  (Warmup) 
Chain 3 Iteration:  100 / 2000 [  5%]  (Warmup) 
Chain 3 Iteration:  200 / 2000 [ 10%]  (Warmup) 
Chain 3 Iteration:  300 / 2000 [ 15%]  (Warmup) 
Chain 3 Iteration:  400 / 2000 [ 20%]  (Warmup) 
Chain 3 Iteration:  500 / 2000 [ 25%]  (Warmup) 
Chain 3 Iteration:  600 / 2000 [ 30%]  (Warmup) 
Chain 3 Iteration:  700 / 2000 [ 35%]  (Warmup) 
Chain 3 Iteration:  800 / 2000 [ 40%]  (Warmup) 
Chain 3 Iteration:  900 / 2000 [ 45%]  (Warmup) 
Chain 3 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
Chain 3 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
Chain 3 Iteration: 1100 / 2000 [ 55%]  (Sampling) 
Chain 3 Iteration: 1200 / 2000 [ 60%]  (Sampling) 
Chain 3 Iteration: 1300 / 2000 [ 65%]  (Sampling) 
Chain 3 Iteration: 1400 / 2000 [ 70%]  (Sampling) 
Chain 3 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
Chain 3 Iteration: 1600 / 2000 [ 80%]  (Sampling) 
Chain 3 Iteration: 1700 / 2000 [ 85%]  (Sampling) 
Chain 3 Iteration: 1800 / 2000 [ 90%]  (Sampling) 
Chain 3 Iteration: 1900 / 2000 [ 95%]  (Sampling) 
Chain 3 Iteration: 2000 / 2000 [100%]  (Sampling) 
Chain 4 Iteration:    1 / 2000 [  0%]  (Warmup) 
Chain 4 Iteration:  100 / 2000 [  5%]  (Warmup) 
Chain 4 Iteration:  200 / 2000 [ 10%]  (Warmup) 
Chain 4 Iteration:  300 / 2000 [ 15%]  (Warmup) 
Chain 4 Iteration:  400 / 2000 [ 20%]  (Warmup) 
Chain 4 Iteration:  500 / 2000 [ 25%]  (Warmup) 
Chain 4 Iteration:  600 / 2000 [ 30%]  (Warmup) 
Chain 4 Iteration:  700 / 2000 [ 35%]  (Warmup) 
Chain 4 Iteration:  800 / 2000 [ 40%]  (Warmup) 
Chain 4 Iteration:  900 / 2000 [ 45%]  (Warmup) 
Chain 4 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
Chain 4 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
Chain 4 Iteration: 1100 / 2000 [ 55%]  (Sampling) 
Chain 4 Iteration: 1200 / 2000 [ 60%]  (Sampling) 
Chain 4 Iteration: 1300 / 2000 [ 65%]  (Sampling) 
Chain 4 Iteration: 1400 / 2000 [ 70%]  (Sampling) 
Chain 4 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
Chain 4 Iteration: 1600 / 2000 [ 80%]  (Sampling) 
Chain 4 Iteration: 1700 / 2000 [ 85%]  (Sampling) 
Chain 4 Iteration: 1800 / 2000 [ 90%]  (Sampling) 
Chain 4 Iteration: 1900 / 2000 [ 95%]  (Sampling) 
Chain 4 Iteration: 2000 / 2000 [100%]  (Sampling) 
Chain 1 finished in 0.0 seconds.
Chain 2 finished in 0.0 seconds.
Chain 3 finished in 0.0 seconds.
Chain 4 finished in 0.0 seconds.

All 4 chains finished successfully.
Mean chain execution time: 0.0 seconds.
Total execution time: 0.1 seconds.
Loading required namespace: rstan

3.3.1 Summary

summary(normal_fit_brm)
 Family: gaussian 
  Links: mu = identity 
Formula: y ~ 1 
   Data: normal_df (Number of observations: 100) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     4.66      0.06     4.53     4.79 1.00     3017     2207

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.65      0.05     0.56     0.75 1.00     3322     2774

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

3.3.2 Draws (posterior samples)

normal_draws_brm <- posterior::as_draws_df(normal_fit_brm)
normal_draws_brm
# A draws_df: 1000 iterations, 4 chains, and 5 variables
   b_Intercept sigma Intercept lprior lp__
1          4.8  0.63       4.8   -4.4 -102
2          4.5  0.73       4.5   -4.4 -104
3          4.5  0.76       4.5   -4.4 -105
4          4.8  0.70       4.8   -4.4 -103
5          4.8  0.61       4.8   -4.4 -104
6          4.7  0.66       4.7   -4.4 -102
7          4.6  0.63       4.6   -4.4 -102
8          4.6  0.63       4.6   -4.4 -102
9          4.7  0.69       4.7   -4.4 -103
10         4.6  0.60       4.6   -4.4 -102
# ... with 3990 more draws
# ... hidden reserved variables {'.chain', '.iteration', '.draw'}
mcmc_trace(normal_draws_brm, pars = c("b_Intercept", "sigma"))

mcmc_hist(normal_draws_brm, pars = c("b_Intercept", "sigma"))
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

3.3.3 Diagnostic summary

rstan::check_hmc_diagnostics(normal_fit_brm$fit)

Divergences:
0 of 4000 iterations ended with a divergence.

Tree depth:
0 of 4000 iterations saturated the maximum tree depth of 10.

Energy:
E-BFMI indicated no pathological behavior.

4 Poisson model

Count data (e.g., species richness, seed counts, pollinator visits…) often follow a Poisson distribution.

\[ y_i \sim \text{Poisson}(\lambda) \]

\[ y_i \sim \operatorname{Poisson\_log}(\log \lambda) \]

Q: What are the mean and variance (\(\lambda\)) of species richness per plot?

4.1 Dummy data

set.seed(123)
y <- rpois(n, lambda = 12.3)

pois_list <- list(
  y = y,
  N = n
)

pois_df <- tibble(
  y = y,
  N = n
)
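A quick sanity check on the simulated counts: for a Poisson distribution the mean equals the variance, so the two sample estimates should be close to each other (and to the true lambda of 12.3):

```r
# Sample mean and variance should both be near lambda = 12.3
mean(y)
var(y)
```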

4.2 CmdStan

stan/pois.stan

 data {
  int<lower=0> N;
  array[N] int<lower=0> y;
}

parameters {
  real<lower=0> lambda;
}

model {
  y ~ poisson(lambda);
  lambda ~ normal(0, 10);
} 

stan/pois_re.stan

 data {
  int<lower=0> N;
  array[N] int<lower=0> y;
}

parameters {
  real log_lambda;
}

model {
  y ~ poisson_log(log_lambda);
  log_lambda ~ normal(0, 2.5);
} 
pois_mod <- cmdstan_model("stan/pois.stan")
pois_re_mod <- cmdstan_model("stan/pois_re.stan")
pois_fit <- pois_mod$sample(
  data = pois_list,
  seed = 123,
  chains = 4,
  parallel_chains = 4,
  iter_warmup = 1000, # number of warmup iterations
  iter_sampling = 1000, # number of sampling iterations
  refresh = 0 # don't print updates
)
Running MCMC with 4 parallel chains...

Chain 1 finished in 0.0 seconds.
Chain 2 finished in 0.0 seconds.
Chain 3 finished in 0.0 seconds.
Chain 4 finished in 0.0 seconds.

All 4 chains finished successfully.
Mean chain execution time: 0.0 seconds.
Total execution time: 0.1 seconds.
pois_re_fit <- pois_re_mod$sample(
  data = pois_list,
  seed = 123,
  chains = 4,
  parallel_chains = 4,
  iter_warmup = 1000, # number of warmup iterations
  iter_sampling = 1000, # number of sampling iterations
  refresh = 0 # don't print updates
)
Running MCMC with 4 parallel chains...

Chain 1 finished in 0.0 seconds.
Chain 2 finished in 0.0 seconds.
Chain 3 finished in 0.0 seconds.
Chain 4 finished in 0.0 seconds.

All 4 chains finished successfully.
Mean chain execution time: 0.0 seconds.
Total execution time: 0.1 seconds.
pois_fit$summary()
# A tibble: 2 × 10
  variable   mean median    sd   mad     q5    q95  rhat ess_bulk ess_tail
  <chr>     <dbl>  <dbl> <dbl> <dbl>  <dbl>  <dbl> <dbl>    <dbl>    <dbl>
1 lp__     1836.  1836.  0.710 0.305 1834.  1836.   1.00    1898.    2556.
2 lambda     12.2   12.2 0.349 0.344   11.6   12.8  1.00    1370.    1871.
pois_re_fit$summary()
# A tibble: 2 × 10
  variable      mean  median     sd    mad     q5    q95  rhat ess_bulk ess_tail
  <chr>        <dbl>   <dbl>  <dbl>  <dbl>  <dbl>  <dbl> <dbl>    <dbl>    <dbl>
1 lp__       1833.   1834.   0.714  0.324  1.83e3 1.83e3  1.00    2170.    2574.
2 log_lambda    2.50    2.50 0.0287 0.0295 2.46e0 2.55e0  1.00    1367.    2197.
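The two parameterizations agree once log_lambda is back-transformed; for example, averaging over the posterior draws:

```r
# exp() of the posterior of log_lambda should match lambda
# from the direct parameterization (about 12.2 here)
exp(pois_re_fit$draws("log_lambda")) |> mean()
```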

4.3 brms

pois_fit_brm <- brm(y ~ 1,
  family = poisson(),
  data = pois_df,
  refresh = 0,
  backend = "cmdstanr")
Start sampling
Running MCMC with 4 sequential chains...

Chain 1 finished in 0.0 seconds.
Chain 2 finished in 0.0 seconds.
Chain 3 finished in 0.0 seconds.
Chain 4 finished in 0.0 seconds.

All 4 chains finished successfully.
Mean chain execution time: 0.0 seconds.
Total execution time: 0.5 seconds.
summary(pois_fit_brm)
 Family: poisson 
  Links: mu = log 
Formula: y ~ 1 
   Data: pois_df (Number of observations: 100) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     2.50      0.03     2.44     2.56 1.00     1650     1850

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

5 Linear model

The simplest multiple linear regression model has two predictors (\(x_1\) and \(x_2\)), two slopes (\(\beta_1\) and \(\beta_2\)), an intercept (\(\beta_0\)), and normally distributed noise with standard deviation \(\sigma\). This model can be written using standard regression notation as

\[ y_i \sim N(\mu_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i}, \sigma) \]

or

\[ y_i \sim N(\mu_i = \boldsymbol{\beta \cdot x_i}, \sigma) \]

where \(\boldsymbol{\beta}\) is a 1 \(\times\) 3 coefficient matrix (or row vector) and \(\boldsymbol{x_i}\) is a 3 \(\times\) 1 predictor matrix (including the intercept). We use \(\boldsymbol{x}\) (3 \(\times\) N) to compute the \(\mu\) vector with length \(N\).

Q: What are the posterior means and 95% credible intervals of the coefficients (\(\beta_0\), \(\beta_1\), \(\beta_2\))?

set.seed(123)
n <- 100
beta <- c(2, 1.2, -0.8)
x1 <- rnorm(n, mean = 0, sd = 1)
x2 <- rnorm(n, mean = 0, sd = 1)
sigma <- 0.4
y <- rnorm(n, mean = beta[1] + beta[2] * x1 + beta[3] * x2, sd = sigma)
lm_list <- list(
  y = y,
  N = length(y),
  x = rbind(1, x1, x2) # 3 x N matrix
)

lm_df <- tibble(
  y = y,
  N = length(y),
  x1 = x1,
  x2 = x2
)

5.1 CmdStan

stan/lm.stan

 data {
  int<lower=0> N;
  vector[N] y;
  matrix[3, N] x;
}

parameters {
  row_vector[3] beta;
  real<lower=0> sigma;
}

model {
  y ~ normal(beta * x, sigma);
  beta ~ normal(0, 2.5);
  sigma ~ cauchy(0, 2.5);
} 
lm_mod <- cmdstan_model("stan/lm.stan")

lm_fit <- lm_mod$sample(
  data = lm_list,
  seed = 123,
  chains = 4,
  parallel_chains = 4,
  iter_warmup = 1000, # number of warmup iterations
  iter_sampling = 1000, # number of sampling iterations
  refresh = 0 # don't print update
)
Running MCMC with 4 parallel chains...

Chain 1 finished in 0.0 seconds.
Chain 2 finished in 0.0 seconds.
Chain 3 finished in 0.0 seconds.
Chain 4 finished in 0.0 seconds.

All 4 chains finished successfully.
Mean chain execution time: 0.0 seconds.
Total execution time: 0.2 seconds.

Use posterior::summarise_draws to check more detailed summary statistics.

lm_summary <- posterior::summarise_draws(
  lm_fit$draws(),
  mean = ~mean(.x),
  sd = ~sd(.x),
  ~posterior::quantile2(.x, probs = c(0.025, 0.05, 0.25, 0.5, 0.75, 0.95, 0.975))
)

lm_summary
# A tibble: 5 × 10
  variable   mean     sd   q2.5     q5    q25    q50    q75    q95  q97.5
  <chr>     <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
1 lp__     44.6   1.48   40.8   41.6   43.8   44.9   45.7   46.3   46.4  
2 beta[1]   2.05  0.0390  1.98   1.99   2.03   2.05   2.08   2.12   2.13 
3 beta[2]   1.15  0.0430  1.06   1.08   1.12   1.15   1.17   1.22   1.23 
4 beta[3]  -0.790 0.0400 -0.870 -0.856 -0.816 -0.789 -0.763 -0.726 -0.715
5 sigma     0.386 0.0292  0.334  0.342  0.365  0.384  0.405  0.437  0.448
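With these fairly weak priors and n = 100, the posterior means should land close to the classical least-squares estimates, which makes for a useful cross-check:

```r
# OLS point estimates for comparison with the Bayesian posterior means
coef(lm(y ~ x1 + x2, data = lm_df))
```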

5.2 brms

lm_fit_brm <- brm(
  y ~ x1 + x2,
  family = gaussian(),
  data = lm_df,
  prior = c(
    prior(normal(0, 2.5), class = "Intercept"),
    prior(normal(0, 2.5), class = "b"),
    prior(cauchy(0, 2.5), class = "sigma")
  ),
  refresh = 0, # don't print update
  backend = "cmdstanr"
)
Start sampling
Running MCMC with 4 sequential chains...

Chain 1 finished in 0.0 seconds.
Chain 2 finished in 0.0 seconds.
Chain 3 finished in 0.0 seconds.
Chain 4 finished in 0.0 seconds.

All 4 chains finished successfully.
Mean chain execution time: 0.0 seconds.
Total execution time: 0.5 seconds.
summary(lm_fit_brm)
 Family: gaussian 
  Links: mu = identity 
Formula: y ~ x1 + x2 
   Data: lm_df (Number of observations: 100) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     2.05      0.04     1.97     2.13 1.00     4347     3260
x1            1.15      0.04     1.06     1.23 1.00     3995     2774
x2           -0.79      0.04    -0.87    -0.71 1.00     4302     2884

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.39      0.03     0.34     0.45 1.00     4241     2851

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
lm_summary_brm <- posterior::summarise_draws(
  posterior::as_draws_df(lm_fit_brm),
  mean = ~mean(.x),
  sd = ~sd(.x),
  ~posterior::quantile2(.x, probs = c(0.025, 0.05, 0.25, 0.5, 0.75, 0.95, 0.975))
)

lm_summary_brm
# A tibble: 7 × 10
  variable       mean     sd    q2.5      q5     q25     q50     q75     q95
  <chr>         <dbl>  <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
1 b_Intercept   2.05  0.0394   1.97    1.99    2.03    2.05    2.08    2.12 
2 b_x1          1.15  0.0427   1.06    1.08    1.12    1.15    1.18    1.22 
3 b_x2         -0.790 0.0404  -0.868  -0.855  -0.816  -0.790  -0.762  -0.723
4 sigma         0.386 0.0286   0.336   0.342   0.366   0.385   0.403   0.435
5 Intercept     2.24  0.0391   2.17    2.18    2.22    2.24    2.27    2.31 
6 lprior       -7.46  0.0172  -7.49   -7.48   -7.47   -7.45   -7.44   -7.43 
7 lp__        -54.3   1.51   -58.1   -57.2   -55.0   -53.9   -53.1   -52.5  
# ℹ 1 more variable: q97.5 <dbl>

6 Varying intercepts

There is a power-law relationship (\(y =\beta_0x^{\beta_1}\)) between tree diameter (DBH) and tree maximum height, and the scaling factor \(\beta_0\) varies among species.

\[ \text{log}\; y_{ij} \sim N(\text{log}\; \beta_{0j} + \beta_1\, \text{log}\; x_{ij}, \sigma) \]

\[ \text{log}\; \beta_{0j} \sim N(\mu_0, \tau) \]

\[ \mu_0 \sim N(0, 2.5) \]

\[ \tau \sim \text{Half-Cauchy}(0, 1) \]

Q: What are the species-level scaling factors (\(\beta_{0j}\)) and the global scaling exponent (\(\beta_1\))?

6.1 Dummy data

  • We simulate a dataset with 10 species and 20 trees per species.
  • Each species has wood density (wd) values.
set.seed(12345)

n_sp <- 10
n_rep <- 20
wd <- rnorm(n_sp, 0, 1)
gamma0 <- 0.6
gamma1 <- 0.1
sigma_y <- 0.1

b1_hat <- gamma1 * wd + gamma0
b1 <- rnorm(n_sp, b1_hat, 0.01)
log_b0 <- rnorm(n_sp, 0.55, 0.05)

# ---- simulate ----
allo_df <- tibble(
  sp     = factor(rep(paste0("sp", 1:n_sp), each = n_rep)),
  wd  = rep(wd, each = n_rep),
  # log DBH drawn from Normal(log(40), sd = 0.5)
  log_xx = rnorm(n_sp * n_rep, mean = log(40), sd = 0.5)) |>
  mutate(
    # add observation-level noise on log-height
    log_y = rnorm(
      n(),
      log_b0[as.integer(sp)] + b1[as.integer(sp)] * log_xx,
      sigma_y),
    dbh = exp(log_xx),
    h = exp(log_y)) |>
  select(sp, wd, dbh, h)

dbh_hist <- allo_df |>
  ggplot(aes(dbh)) +
  geom_histogram() +
  xlab("DBH (cm)") +
  theme(
    legend.position = "none",
    plot.title = element_text(size = 24)
    )

h_hist <- allo_df |>
  ggplot(aes(h)) +
  geom_histogram() +
  xlab("Height (m)") +
  theme(
    legend.position = "none",
    plot.title = element_text(size = 24)
    )

p1 <- allo_df |>
  ggplot(aes(x = dbh, y = h, col = sp)) +
  geom_point() +
  xlab("DBH (cm)") +
  ylab(expression("Height (m)")) +
  theme(
    legend.position = "none",
    plot.title = element_text(size = 24)
    )

p2 <- p1 +
  scale_x_log10() +
  scale_y_log10()


dbh_hist + h_hist + p1 + p2

allo_list <- list(
  log_h = log(allo_df$h),
  log_dbh = log(allo_df$dbh),
  sp = as.integer(allo_df$sp),
  N = nrow(allo_df),
  J = allo_df$sp |> unique() |> length()
)

6.2 CmdStan

stan/vint.stan

 data {
  int<lower=1> N;               // total number of observations
  int<lower=1> J;               // number of species
  array[N] int<lower=1, upper=J> sp; // sp index
  vector[N] log_dbh;            // predictor: log(DBH)
  vector[N] log_h;              // response: log(height)
}

parameters {
  real mu0;                   // grand mean of log(β₀ⱼ)
  real<lower=0> tau;          // SD of species-level log intercepts
  vector[J] log_beta0;        // species-level intercepts, log(β₀ⱼ)
  real beta1;                 // common slope on log(DBH)
  real<lower=0> sigma;        // residual SD
}

model {
  // Likelihood
  log_h ~ normal(log_beta0[sp] + beta1 * log_dbh, sigma);

  // Priors
  mu0 ~ normal(0, 2.5);
  tau      ~ cauchy(0, 1);
  log_beta0  ~ normal(mu0, tau);
  beta1    ~ normal(0, 2.5);
  sigma    ~ cauchy(0, 1);
}

// pointwise log-likelihood, needed for model comparison (e.g., LOO)
generated quantities {
  vector[N] log_lik;
  for (i in 1:N)
    log_lik[i] = normal_lpdf(log_h[i]
                    | log_beta0[sp[i]] + beta1 * log_dbh[i],
                      sigma);
} 
vint_mod <- cmdstan_model("stan/vint.stan")

vint_fit <- vint_mod$sample(
  data = allo_list,
  seed = 123,
  chains = 4,
  parallel_chains = 4,
  iter_warmup = 1000, # number of warmup iterations
  iter_sampling = 1000, # number of sampling iterations
  refresh = 0 # don't print update
)
Running MCMC with 4 parallel chains...
Chain 1 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Chain 1 Exception: normal_lpdf: Scale parameter is 0, but must be positive! (in '/tmp/RtmpS3fhTi/model-497d55c0ea515.stan', line 19, column 2 to column 57)
Chain 1 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
Chain 1 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
Chain 1 
Chain 2 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Chain 2 Exception: normal_lpdf: Scale parameter is 0, but must be positive! (in '/tmp/RtmpS3fhTi/model-497d55c0ea515.stan', line 19, column 2 to column 57)
Chain 2 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
Chain 2 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
Chain 2 
Chain 3 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Chain 3 Exception: normal_lpdf: Scale parameter is 0, but must be positive! (in '/tmp/RtmpS3fhTi/model-497d55c0ea515.stan', line 19, column 2 to column 57)
Chain 3 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
Chain 3 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
Chain 3 
Chain 1 finished in 0.2 seconds.
Chain 2 finished in 0.2 seconds.
Chain 3 finished in 0.2 seconds.
Chain 4 finished in 0.2 seconds.

All 4 chains finished successfully.
Mean chain execution time: 0.2 seconds.
Total execution time: 0.3 seconds.
vint_fit$summary()
# A tibble: 215 × 10
   variable         mean   median     sd    mad      q5      q95  rhat ess_bulk
   <chr>           <dbl>    <dbl>  <dbl>  <dbl>   <dbl>    <dbl> <dbl>    <dbl>
 1 lp__         365.     366.     2.69   2.59   360.    369.      1.00    1185.
 2 mu0            0.483    0.484  0.119  0.114    0.285   0.679   1.00    1081.
 3 tau            0.332    0.314  0.0912 0.0781   0.221   0.503   1.00    1508.
 4 log_beta0[1]   0.782    0.782  0.0565 0.0569   0.689   0.875   1.01     416.
 5 log_beta0[2]   0.921    0.923  0.0544 0.0542   0.832   1.01    1.01     419.
 6 log_beta0[3]   0.440    0.442  0.0555 0.0564   0.349   0.531   1.01     413.
 7 log_beta0[4]   0.265    0.266  0.0580 0.0597   0.171   0.359   1.01     393.
 8 log_beta0[5]   0.620    0.621  0.0538 0.0535   0.531   0.708   1.01     420.
 9 log_beta0[6]  -0.0154  -0.0146 0.0545 0.0550  -0.105   0.0733  1.01     432.
10 log_beta0[7]   0.710    0.712  0.0545 0.0546   0.621   0.798   1.01     420.
# ℹ 205 more rows
# ℹ 1 more variable: ess_tail <dbl>
vint_fit$diagnostic_summary()
$num_divergent
[1] 0 0 0 0

$num_max_treedepth
[1] 0 0 0 0

$ebfmi
[1] 1.0896007 0.9152977 1.0290832 1.0009125

6.3 brms

priors <- c(
  prior(normal(0, 2.5), class = Intercept),
  prior(normal(0, 2.5),   class = b),
  prior(cauchy(0, 1),   class = sd),
  prior(cauchy(0, 1),   class = sigma)
)

vint_fit_brm <- brm(
  log(h) ~ log(dbh) + (1 | sp),
  family = gaussian(),
  data = allo_df,
  prior = priors,
  seed = 123,
  refresh = 0, # don't print update
  backend = "cmdstanr"
)
Start sampling
Running MCMC with 4 sequential chains...

Chain 1 finished in 0.5 seconds.
Chain 2 finished in 0.5 seconds.
Chain 3 finished in 0.5 seconds.
Chain 4 finished in 0.5 seconds.

All 4 chains finished successfully.
Mean chain execution time: 0.5 seconds.
Total execution time: 2.3 seconds.
summary(vint_fit_brm)
 Family: gaussian 
  Links: mu = identity 
Formula: log(h) ~ log(dbh) + (1 | sp) 
   Data: allo_df (Number of observations: 200) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Multilevel Hyperparameters:
~sp (Number of levels: 10) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.33      0.09     0.21     0.55 1.01      714     1210

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     0.47      0.12     0.23     0.70 1.00      813     1101
logdbh        0.61      0.01     0.59     0.64 1.00     2035     1824

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.10      0.01     0.09     0.11 1.00     2004     2100

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
plot(vint_fit_brm)
Loading required namespace: rstan

6.4 lme4

vint_fit_lme <- lme4::lmer(
  log(h) ~ log(dbh) + (1 | sp),
  data = allo_df)

summary(vint_fit_lme)
Linear mixed model fit by REML ['lmerMod']
Formula: log(h) ~ log(dbh) + (1 | sp)
   Data: allo_df

REML criterion at convergence: -298.3

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-2.66966 -0.62316  0.04254  0.60122  2.28315 

Random effects:
 Groups   Name        Variance Std.Dev.
 sp       (Intercept) 0.084434 0.29057 
 Residual             0.009794 0.09897 
Number of obs: 200, groups:  sp, 10

Fixed effects:
            Estimate Std. Error t value
(Intercept)  0.47916    0.10454   4.584
log(dbh)     0.61154    0.01314  46.525

Correlation of Fixed Effects:
         (Intr)
log(dbh) -0.472

6.5 Visualization and predictions

6.5.1 CmdStan

# Predicted DBH-height curves per species with 95% intervals
pred_grid <- allo_df |>
  dplyr::group_by(sp) |>
  dplyr::summarise(
    dbh = list(seq(min(dbh), max(dbh), length.out = 60)),
    .groups = "drop"
  ) |>
  tidyr::unnest(dbh)

cmd_draws <- posterior::as_draws_matrix(vint_fit$draws(c("beta1", "log_beta0")))
beta1_draws <- cmd_draws[, "beta1"]
log_beta0_cols <- grep("^log_beta0\\[", colnames(cmd_draws), value = TRUE)
species_levels <- levels(allo_df$sp)

DBH grid used for the predicted lines of each species.

pred_grid
# A tibble: 600 × 2
   sp      dbh
   <fct> <dbl>
 1 sp1    12.2
 2 sp1    14.0
 3 sp1    15.8
 4 sp1    17.6
 5 sp1    19.5
 6 sp1    21.3
 7 sp1    23.1
 8 sp1    25.0
 9 sp1    26.8
10 sp1    28.6
# ℹ 590 more rows

Posterior draws of the slope parameter (\(\beta_1\)).

str(beta1_draws)
 'draws_matrix' num [1:4000, 1] 0.638 0.635 0.62 0.627 0.629 ...
 - attr(*, "dimnames")=List of 2
  ..$ draw    : chr [1:4000] "1" "2" "3" "4" ...
  ..$ variable: chr "beta1"
 - attr(*, "nchains")= int 4

Column names for log-transformed species-level intercepts (\(\log \beta_{0j}\)). We use this to extract posterior draws for each species.

str(log_beta0_cols)
 chr [1:10] "log_beta0[1]" "log_beta0[2]" "log_beta0[3]" "log_beta0[4]" ...
pred_cmd_summary <- dplyr::bind_cols(
  pred_grid,
  purrr::map2_dfr(
    match(pred_grid$sp, species_levels),
    log(pred_grid$dbh),
    ~{
      log_mu <- cmd_draws[, log_beta0_cols[.x]] + beta1_draws * .y
      height_draws <- exp(log_mu)
      tibble::tibble(
        estimate = stats::median(height_draws),
        lower = stats::quantile(height_draws, 0.025),
        upper = stats::quantile(height_draws, 0.975)
      )
    }
  )
)

The code above computes the predicted height summary (posterior median and 95% interval) for each species and DBH value.

pred_cmd_summary
# A tibble: 600 × 5
   sp      dbh estimate lower upper
   <fct> <dbl>    <dbl> <dbl> <dbl>
 1 sp1    12.2     10.1  9.53  10.6
 2 sp1    14.0     11.0 10.4   11.6
 3 sp1    15.8     11.8 11.2   12.4
 4 sp1    17.6     12.6 12.0   13.3
 5 sp1    19.5     13.4 12.8   14.1
 6 sp1    21.3     14.2 13.5   14.9
 7 sp1    23.1     14.9 14.2   15.6
 8 sp1    25.0     15.6 14.9   16.3
 9 sp1    26.8     16.3 15.6   17.1
10 sp1    28.6     17.0 16.2   17.8
# ℹ 590 more rows
ggplot(pred_cmd_summary, aes(dbh, estimate, colour = sp)) +
  geom_ribbon(aes(ymin = lower, ymax = upper, fill = sp), alpha = 0.15, colour = NA) +
  geom_line() +
  geom_point(
    data = allo_df,
    aes(dbh, h, colour = sp),
    size = 1,
    alpha = 0.6,
    inherit.aes = FALSE
  ) +
  scale_x_log10() +
  scale_y_log10() +
  guides(fill = "none") +
  labs(
    x = "DBH (cm)",
    y = "Height (m)",
    colour = "Species"
  )

6.5.2 brms

With fitted() available, preparing pred_summary is simpler than in the CmdStan workflow.

pred_summary <- fitted(vint_fit_brm, newdata = pred_grid, re_formula = NULL) |>
  tibble::as_tibble() |>
  dplyr::bind_cols(pred_grid) |>
  dplyr::mutate(
    estimate = exp(Estimate),
    lower = exp(Q2.5),
    upper = exp(Q97.5)
  )

ggplot(pred_summary, aes(dbh, estimate, color = sp)) +
  geom_ribbon(aes(ymin = lower, ymax = upper, fill = sp), alpha = 0.15, colour = NA) +
  geom_line() +
  geom_point(
    data = allo_df,
    aes(dbh, h, colour = sp),
    size = 1,
    alpha = 0.6,
    inherit.aes = FALSE
  ) +
  scale_x_log10() +
  scale_y_log10() +
  guides(fill = "none") +
  labs(
    x = "DBH (cm)",
    y = "Height (m)",
    colour = "Species"
  )

7 Reparameterization for multilevel models

Diagnosing Biased Inference with Divergences by Michael Betancourt

7.1 Eight schools example

What is the effect of the treatment (a new education method) on students’ scores for each school and across schools?

school     y   sigma
School_1   28  15
School_2    8  10
School_3   -3  16
School_4    7  11
School_5   -1   9
School_6    1  11
School_7   18  10
School_8   12  18
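The table above can be packed into the list format that CmdStan expects (a minimal sketch; the object name `schools_list` is ours):

```r
# Eight schools data in the list format cmdstanr expects
schools_list <- list(
  J = 8,                                  # number of schools
  y = c(28, 8, -3, 7, -1, 1, 18, 12),     # observed treatment effects
  sigma = c(15, 10, 16, 11, 9, 11, 10, 18) # known standard errors
)
str(schools_list)
```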

7.2 Centered parameterization

\[ y_j \sim \mathcal{N}\bigl(\theta_j,\;\sigma_j\bigr) \]

\[ \theta_j \sim \mathcal{N}\bigl(\mu,\;\tau \bigr) \]

\[ \mu \sim \mathcal{N}\bigl(0,\;5 \bigr) \]

\[ \tau \sim \text{Half-Cauchy}(0, 2.5) \]

Note: \(\sigma_j\) is a known constant (i.e., data), so we don’t have to estimate it.

This parameterization is intuitive, but it often leads to convergence issues in hierarchical models.

7.3 Non-centered parameterization

\[ \tilde{\theta_j} \sim \mathcal{N}\bigl(0,\;1 \bigr) \]

\[ \theta_j = \mu + \tau \cdot \tilde{\theta_j} \]

In this parameterization, we introduce a new latent variable \(\tilde{\theta_j}\) that is independent of the other parameters. This allows the sampler to explore the posterior more efficiently and avoids problems with convergence. We often use this parameterization when we have hierarchical models with varying intercepts or slopes.
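As a sketch, the non-centered eight-schools model can be written in Stan like this, using the priors from above (the file name `stan/schools_nc.stan` is our suggestion):

```stan
data {
  int<lower=0> J;            // number of schools
  vector[J] y;               // observed treatment effects
  vector<lower=0>[J] sigma;  // known standard errors
}

parameters {
  real mu;                   // population mean effect
  real<lower=0> tau;         // between-school standard deviation
  vector[J] theta_tilde;     // standardized school effects
}

transformed parameters {
  // non-centered: theta_j = mu + tau * theta_tilde_j
  vector[J] theta = mu + tau * theta_tilde;
}

model {
  theta_tilde ~ std_normal();
  mu ~ normal(0, 5);
  tau ~ cauchy(0, 2.5);
  y ~ normal(theta, sigma);
}
```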

8 Varying slopes and intercepts

There is a power-law relationship (\(y =\beta_0 x^{\beta_1}\)) between tree diameter (DBH) and tree maximum height, and both the scaling factor \(\beta_0\) and the exponent \(\beta_1\) vary among species.

\[ \text{log}\; y_{ij} \sim N(\text{log}\; \beta_{0j} + \beta_{1j}\, \text{log}\; x_{ij}, \sigma) \]

\[ \begin{bmatrix} \text{log}\; \beta_{0j} \\ \beta_{1j} \end{bmatrix} \sim \mathrm{MVN} \Biggl( \boldsymbol{\mu} = \begin{bmatrix} \mu_{0} \\ \mu_{1} \end{bmatrix}, \boldsymbol{\Sigma} = \begin{bmatrix} \sigma_{0}^2 & \rho \sigma_{0} \sigma_{1} \\ \rho \sigma_{0} \sigma_{1} & \sigma_{1}^2 \end{bmatrix} \Biggr) \]

Q: What are the species-level scaling factors (\(\beta_{0j}\)) and scaling exponents (\(\beta_{1j}\))? What are the global scaling factor (\(\mu_0\)) and scaling exponent (\(\mu_1\))?

8.1 Centered parameterization

\[ \boldsymbol{\mu} \sim N(0, 2.5) \]

\[ \boldsymbol{\Sigma} \sim \mathcal{W}^{-1}_p(\nu,\,\Psi) \]

This parameterization using the inverse-Wishart distribution \(\mathcal{W}^{-1}\) is not recommended for hierarchical models with varying slopes and intercepts.

8.2 Non-Centered parameterization

Optimization through Cholesky factorization

\[ \boldsymbol{z}_j \sim \mathcal{N}(\mathbf0,\,I_2) \] \[ \begin{pmatrix}\log\beta_{0j}\\[3pt]\beta_{1j}\end{pmatrix} = \boldsymbol{\mu} + L\, \boldsymbol{z}_j \]

Some linear algebra (Cholesky decomposition):

\[ \Sigma \;=\;L\,L^\top \;=\;\bigl[\mathrm{diag}(\tau)\,L_\Omega\bigr]\, \bigl[L_\Omega^\top\,\mathrm{diag}(\tau)\bigr] \;=\;\mathrm{diag}(\tau)\,\Omega\,\mathrm{diag}(\tau). \]

\[ L_\Omega \sim \mathrm{LKJ\_Corr\_Cholesky}(2) \quad \text{or} \quad \Omega \sim \mathrm{LKJ\_Corr}(2) \] \[ \tau_i \sim \mathrm{Half\text{-}Cauchy}(0,1) \]

Rather than sampling the correlation matrix \(\Omega\) directly, sampling the Cholesky factor \(L_\Omega\) is computationally more efficient and numerically stable.
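A quick numerical check of the decomposition above (a sketch with arbitrary values; in R, `chol()` returns the upper-triangular factor, so we transpose it):

```r
tau <- c(0.5, 1.2)                        # scales
Omega <- matrix(c(1, 0.3, 0.3, 1), 2, 2)  # correlation matrix
L_Omega <- t(chol(Omega))                 # lower-triangular Cholesky factor
L <- diag(tau) %*% L_Omega                # diag_pre_multiply(tau, L_Omega) in Stan
Sigma <- L %*% t(L)
# Sigma equals diag(tau) %*% Omega %*% diag(tau)
all.equal(Sigma, diag(tau) %*% Omega %*% diag(tau)) # TRUE
```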

In the LKJ() prior:

  • \(\eta = 1\) indicates a flat prior (uniform distribution) over correlation matrices.

  • \(\eta > 1\) indicates weaker correlations.

We usually use \(\eta = 2\) because it places more prior mass on weak correlations while still allowing moderate to strong ones.
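For a \(2 \times 2\) correlation matrix, the LKJ(\(\eta\)) prior implies a marginal density for \(\rho\) proportional to \((1 - \rho^2)^{\eta - 1}\), which makes the role of \(\eta\) easy to check (a small sketch):

```r
rho <- seq(-0.99, 0.99, by = 0.01)
lkj_kernel <- function(rho, eta) (1 - rho^2)^(eta - 1)
k1 <- lkj_kernel(rho, 1) # eta = 1: flat, every value equals 1
k2 <- lkj_kernel(rho, 2) # eta = 2: unimodal, peaked at rho = 0
```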

A simple simulation from a bivariate normal prior highlights how correlation induces joint structure in \((\log \beta_{0j}, \beta_{1j})\).

set.seed(123)
Sigma_demo <- matrix(c(0.2, 0.15, 0.15, 0.3), nrow = 2)
mu_demo <- c(-0.2, 0.6)
mvn_draws <- MASS::mvrnorm(n = 400, mu = mu_demo, Sigma = Sigma_demo) |>
  tibble::as_tibble(.name_repair = ~c("log_beta0", "beta1"))

ggplot(mvn_draws, aes(x = log_beta0, y = beta1)) +
  geom_point(alpha = 0.4, colour = "#1f78b4") +
  labs(
    x = expression(log(beta[0])),
    y = expression(beta[1]),
    title = "Simulated correlated effects from a bivariate normal prior"
  ) +
  theme_minimal(base_size = 14)

8.3 CmdStan

stan/vslope.stan

 data {
  int<lower=0> N;              // num individuals
  int<lower=1> K;              // num ind predictors
  int<lower=1> J;              // num groups
  array[N] int<lower=1, upper=J> jj;  // group for individual
  int<lower=1> L;              // number of group-level predictors
  matrix[N, K] x;              // individual predictors
  matrix[L, J] u;              // group predictors
  vector[N] y;                 // outcomes
}

parameters {
  matrix[K, L] gamma;          // coefficients across groups
  real<lower=0> sigma;         // prediction error scale
  vector<lower=0>[K] tau;      // prior scale
  matrix[K, J] z;
  cholesky_factor_corr[K] L_Omega;
}

transformed parameters {
  matrix[K, J] beta;
  beta = gamma * u + diag_pre_multiply(tau, L_Omega) * z;
}

model {
  to_vector(z) ~ std_normal();
  // tau ~ cauchy(0, 2.5);
  tau ~ normal(0, 2.5);
  L_Omega ~ lkj_corr_cholesky(2);
  to_vector(gamma) ~ normal(0, 2.5);
  for (n in 1:N) {
    y[n] ~ normal(x[n, ] * beta[, jj[n]], sigma);
  }
} 
allo_vslope_list <- list(
  y = log(allo_df$h),
  x = cbind(1, log(allo_df$dbh)),
  jj = as.integer(allo_df$sp),
  N = nrow(allo_df),
  K = 2, # number of predictors (intercept + slope)
  J = allo_df$sp |> unique() |> length(),
  L = 1 # number of group-level predictors (intercept)
)
allo_vslope_list$u <- matrix(1, nrow = allo_vslope_list$L, ncol = allo_vslope_list$J)
vslope_mod <- cmdstan_model("stan/vslope.stan")

vslope_fit <- vslope_mod$sample(
  data = allo_vslope_list,
  seed = 1234,
  chains = 4,
  parallel_chains = 4,
  iter_warmup = 1000, # number of warmup iterations
  iter_sampling = 1000, # number of sampling iterations
  adapt_delta = 0.95, # increase adapt_delta to avoid divergent transitions
  max_treedepth = 15, # increase max_treedepth to avoid max treedepth errors
  refresh = 0, # don't print update
  show_messages = FALSE   # suppress Stan’s “Informational Message” output
)
Chain 1 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Chain 1 Exception: lkj_corr_cholesky_lpdf: Random variable[2] is 0, but must be positive! (in '/tmp/RtmpyElzv5/model-484a4544c91b3.stan', line 29, column 2 to column 33)
Chain 1 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
Chain 1 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
Chain 1 
vslope_fit$summary()
# A tibble: 50 × 10
   variable       mean   median      sd     mad        q5     q95  rhat ess_bulk
   <chr>         <dbl>    <dbl>   <dbl>   <dbl>     <dbl>   <dbl> <dbl>    <dbl>
 1 lp__       360.     360.     4.50    4.32    352.      366.     1.00    1126.
 2 gamma[1,1]   0.502    0.504  0.0547  0.0530    0.413     0.590  1.00    3475.
 3 gamma[2,1]   0.606    0.606  0.0310  0.0284    0.556     0.658  1.00    1853.
 4 sigma        0.0924   0.0923 0.00469 0.00462   0.0849    0.100  1.00    4905.
 5 tau[1]       0.0696   0.0586 0.0553  0.0502    0.00553   0.171  1.00    1623.
 6 tau[2]       0.0884   0.0841 0.0280  0.0233    0.0527    0.140  1.00    1491.
 7 z[1,1]       0.580    0.636  0.900   0.849    -1.01      1.98   1.00    1960.
 8 z[2,1]       0.658    0.711  0.628   0.555    -0.461     1.59   1.00    2010.
 9 z[1,2]       0.161    0.187  0.994   1.02     -1.52      1.72   1.00    1537.
10 z[2,2]       1.29     1.31   0.667   0.612     0.173     2.35   1.00    1787.
# ℹ 40 more rows
# ℹ 1 more variable: ess_tail <dbl>
vslope_fit$diagnostic_summary()
$num_divergent
[1] 0 0 0 0

$num_max_treedepth
[1] 0 0 0 0

$ebfmi
[1] 0.8183985 0.7952493 0.8137258 0.8179199

8.4 brms

priors <- c(
  prior(normal(0, 2.5), class = Intercept),
  prior(normal(0, 2.5),   class = b),
  prior(lkj(2), class = cor),                # Ω ~ LKJ(2)
  prior(cauchy(0, 1),   class = sigma)
)

tic()
slope_fit_brm <- brm(
  log(h) ~ log(dbh) + (1 + log(dbh) | sp),
  family = gaussian(),
  data = allo_df,
  prior = priors,
  refresh = 0, # don't print update
  control = list(
    adapt_delta  = 0.95,  # default is 0.8
    max_treedepth = 15    # default is 10
  ),
  backend = "cmdstanr"
)
Start sampling
Running MCMC with 4 sequential chains...

Chain 1 finished in 3.9 seconds.
Chain 2 finished in 3.7 seconds.
Chain 3 finished in 3.4 seconds.
Chain 4 finished in 3.5 seconds.

All 4 chains finished successfully.
Mean chain execution time: 3.7 seconds.
Total execution time: 15.0 seconds.
Warning: 3 of 4000 (0.0%) transitions ended with a divergence.
See https://mc-stan.org/misc/warnings for details.
toc()
27.193 sec elapsed

Divergent transitions! We may need to increase adapt_delta further or revise the priors. Note that custom reparameterization is less straightforward in brms than when writing your own Stan code.

summary(slope_fit_brm)
Warning: There were 3 divergent transitions after warmup. Increasing
adapt_delta above 0.95 may help. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
 Family: gaussian 
  Links: mu = identity 
Formula: log(h) ~ log(dbh) + (1 + log(dbh) | sp) 
   Data: allo_df (Number of observations: 200) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Multilevel Hyperparameters:
~sp (Number of levels: 10) 
                      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
sd(Intercept)             0.07      0.06     0.00     0.20 1.00     2448
sd(logdbh)                0.09      0.03     0.05     0.15 1.00     1512
cor(Intercept,logdbh)     0.13      0.43    -0.70     0.85 1.01      802
                      Tail_ESS
sd(Intercept)             2233
sd(logdbh)                2077
cor(Intercept,logdbh)     1210

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     0.50      0.05     0.39     0.61 1.00     3694     3266
logdbh        0.60      0.03     0.54     0.67 1.00     1467     1680

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.09      0.00     0.08     0.10 1.00     4853     3209

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

8.5 lme4

slope_fit_lme <- lme4::lmer(
  log(h) ~ log(dbh) + (1 + log(dbh) | sp),
  data = allo_df)
boundary (singular) fit: see help('isSingular')

A singular fit! The random-effects correlation is estimated at its boundary (Corr = 1.00).

summary(slope_fit_lme)
Linear mixed model fit by REML ['lmerMod']
Formula: log(h) ~ log(dbh) + (1 + log(dbh) | sp)
   Data: allo_df

REML criterion at convergence: -327.2

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-3.14307 -0.63987  0.08714  0.61696  3.03735 

Random effects:
 Groups   Name        Variance Std.Dev. Corr
 sp       (Intercept) 0.001326 0.03641      
          log(dbh)    0.004607 0.06787  1.00
 Residual             0.008397 0.09164      
Number of obs: 200, groups:  sp, 10

Fixed effects:
            Estimate Std. Error t value
(Intercept)  0.49980    0.04759   10.50
log(dbh)     0.60637    0.02468   24.57

Correlation of Fixed Effects:
         (Intr)
log(dbh) -0.264
optimizer (nloptwrap) convergence code: 0 (OK)
boundary (singular) fit: see help('isSingular')

9 Varying slopes and intercepts, and group-level predictors

There is a power-law relationship (\(y =\beta_0 x^{\beta_1}\)) between tree diameter (DBH) and tree maximum height, and both the scaling factor \(\beta_0\) and the exponent \(\beta_1\) vary among species. These species-level variations can be partly explained by the wood density of each species (group-level predictor \(u\)).

\[ \text{log}\; y_{ij} \sim N(\text{log}\; \beta_{0j} + \beta_{1j}\, \text{log}\; x_{ij}, \sigma) \]

\[ \begin{bmatrix} \text{log}\; \beta_{0j} \\ \beta_{1j} \end{bmatrix} \sim \mathrm{MVN} \Biggl( \underbrace{ \begin{bmatrix} \gamma_{0}^{(\beta_0)} + \gamma_{1}^{(\beta_0)}u_{j} \\ \gamma_{0}^{(\beta_1)} + \gamma_{1}^{(\beta_1)}u_{j} \end{bmatrix}}_{\text{mean depends on }u_j}, \begin{bmatrix} \sigma_{0}^2 & \rho \sigma_{0} \sigma_{1} \\ \rho \sigma_{0} \sigma_{1} & \sigma_{1}^2 \end{bmatrix} \Biggr) \]

The remainder of this section follows the setup from the varying slopes and intercepts section.
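In vslope.stan, the group-level means enter through the matrix product `gamma * u` (a K × L coefficient matrix times an L × J predictor matrix). A small sketch with hypothetical coefficient values shows the shapes involved:

```r
K <- 2; L <- 2; J <- 10
gamma <- matrix(c(0.5, 0.6, 0.1, -0.05), nrow = K) # hypothetical coefficients
u <- rbind(rep(1, J), runif(J, 0.3, 0.8))          # intercept row + wood-density row
mu_beta <- gamma %*% u  # K x J: one (log beta0, beta1) mean pair per species
dim(mu_beta)            # 2 10
```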

allo_vslope_list$L <- 2 # number of group-level predictors (intercept and wd)
allo_vslope_list$u <- rbind(
  1, # intercept row
  tapply(allo_df$wd, allo_df$sp, unique) # one wood-density value per species
)
vgrp_fit <- vslope_mod$sample(
  data = allo_vslope_list,
  seed = 1234,
  chains = 4,
  parallel_chains = 4,
  iter_warmup = 1000, # number of warmup iterations
  iter_sampling = 1000, # number of sampling iterations
  adapt_delta = 0.95, # increase adapt_delta to avoid divergent transitions
  max_treedepth = 15, # increase max_treedepth to avoid max treedepth errors
  refresh = 0 # don't print update
)

vgrp_fit$summary()
vgrp_fit$diagnostic_summary()

9.1 brms

priors <- c(
  prior(normal(0, 2.5), class = Intercept),
  prior(normal(0, 2.5),   class = b),
  prior(lkj(2), class = cor),                # Ω ~ LKJ(2)
  prior(cauchy(0, 1),   class = sigma)
)
vgrp_fit_brm <- brm(
  log(h) ~ log(dbh) * wd + (1 + log(dbh) | sp),
  family = gaussian(),
  data = allo_df,
  prior = priors,
  refresh = 0, # don't print update
  control = list(
    adapt_delta  = 0.95,  # default is 0.8
    max_treedepth = 15    # default is 10
  ),
  backend = "cmdstanr"
)
Start sampling
Running MCMC with 4 sequential chains...

Chain 1 finished in 4.5 seconds.
Chain 2 finished in 5.0 seconds.
Chain 3 finished in 4.8 seconds.
Chain 4 finished in 5.2 seconds.

All 4 chains finished successfully.
Mean chain execution time: 4.9 seconds.
Total execution time: 19.7 seconds.
Warning: 1 of 4000 (0.0%) transitions ended with a divergence.
See https://mc-stan.org/misc/warnings for details.
summary(vgrp_fit_brm)
Warning: There were 1 divergent transitions after warmup. Increasing
adapt_delta above 0.95 may help. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
 Family: gaussian 
  Links: mu = identity 
Formula: log(h) ~ log(dbh) * wd + (1 + log(dbh) | sp) 
   Data: allo_df (Number of observations: 200) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Multilevel Hyperparameters:
~sp (Number of levels: 10) 
                      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
sd(Intercept)             0.09      0.07     0.00     0.25 1.00     1526
sd(logdbh)                0.08      0.03     0.04     0.15 1.00     1635
cor(Intercept,logdbh)     0.14      0.42    -0.69     0.86 1.00      919
                      Tail_ESS
sd(Intercept)             1437
sd(logdbh)                2189
cor(Intercept,logdbh)     1307

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     0.50      0.06     0.38     0.62 1.00     4203     3131
logdbh        0.60      0.03     0.54     0.66 1.00     1890     2121
wd            0.02      0.07    -0.12     0.17 1.00     4449     2773
logdbh:wd    -0.05      0.04    -0.13     0.03 1.00     2368     2327

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.09      0.00     0.08     0.10 1.00     6320     2847

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
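Beyond the summary table, brms makes it easy to pull out the species-specific coefficients (population-level plus group-level parts combined) and to run a quick posterior predictive check:

```r
# species-specific slopes on log(dbh), with posterior uncertainty
coef(vgrp_fit_brm)$sp[, , "logdbh"]

# compare observed log(h) to draws from the posterior predictive distribution
pp_check(vgrp_fit_brm, ndraws = 50)
```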

9.2 lme4

grp_fit_lme <- lme4::lmer(
  log(h) ~ log(dbh) * wd + (1 + log(dbh) | sp),
  data = allo_df)
boundary (singular) fit: see help('isSingular')

The singular-fit warning means lme4's maximum-likelihood estimate landed on the boundary of the parameter space, typically a random-effect standard deviation estimated at exactly zero or a correlation at ±1. Here the random intercept SD is barely identified (the brms posterior for sd(Intercept) also piles up near zero). The Bayesian model does not produce this warning because the priors keep the variance components away from the boundary and the posterior quantifies the remaining uncertainty instead of collapsing to a point estimate.
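Despite the boundary issue in the random effects, the fixed-effect point estimates from lme4 should be close to the brms population-level posterior means, which is a quick sanity check worth running:

```r
# point estimates from lme4
lme4::fixef(grp_fit_lme)

# posterior means and 95% CIs from brms
fixef(vgrp_fit_brm)
```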

10 References

devtools::session_info()
─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.5.1 (2025-06-13)
 os       Ubuntu 24.04.3 LTS
 system   x86_64, linux-gnu
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       Etc/UTC
 date     2025-10-31
 pandoc   3.8.1 @ /usr/bin/ (via rmarkdown)
 quarto   1.7.32 @ /usr/local/bin/quarto

─ Packages ───────────────────────────────────────────────────────────────────
 ! package        * version  date (UTC) lib source
 P abind            1.4-8    2024-09-12 [?] RSPM (R 4.5.0)
 P backports        1.5.0    2024-05-23 [?] RSPM (R 4.5.0)
 P bayesplot      * 1.14.0   2025-08-31 [?] RSPM (R 4.5.0)
 P boot             1.3-32   2025-08-29 [?] RSPM (R 4.5.0)
 P bridgesampling   1.1-2    2021-04-16 [?] RSPM (R 4.5.0)
 P brms           * 2.23.0   2025-09-09 [?] RSPM (R 4.5.0)
 P Brobdingnag      1.2-9    2022-10-19 [?] RSPM (R 4.5.0)
 P cachem           1.1.0    2024-05-16 [?] RSPM (R 4.5.0)
 P checkmate        2.3.3    2025-08-18 [?] RSPM (R 4.5.0)
 P cli              3.6.5    2025-04-23 [?] RSPM (R 4.5.0)
 P cmdstanr       * 0.9.0    2025-03-30 [?] repository (https://github.com/stan-dev/cmdstanr@da99e2b)
 P coda             0.19-4.1 2024-01-31 [?] RSPM (R 4.5.0)
   codetools        0.2-20   2024-03-31 [3] CRAN (R 4.5.1)
 P data.table       1.17.8   2025-07-10 [?] RSPM (R 4.5.0)
   devtools         2.4.6    2025-10-03 [2] RSPM (R 4.5.0)
 P digest           0.6.37   2024-08-19 [?] RSPM (R 4.5.0)
 P distributional   0.5.0    2024-09-17 [?] RSPM (R 4.5.0)
 P dplyr          * 1.1.4    2023-11-17 [?] RSPM (R 4.5.0)
   ellipsis         0.3.2    2021-04-29 [2] RSPM (R 4.5.0)
 P evaluate         1.0.5    2025-08-27 [?] RSPM (R 4.5.0)
 P farver           2.1.2    2024-05-13 [?] RSPM (R 4.5.0)
 P fastmap          1.2.0    2024-05-15 [?] RSPM (R 4.5.0)
 P forcats        * 1.0.1    2025-09-25 [?] RSPM (R 4.5.0)
 P fs               1.6.6    2025-04-12 [?] RSPM (R 4.5.0)
 P generics         0.1.4    2025-05-09 [?] RSPM (R 4.5.0)
 P ggplot2        * 4.0.0    2025-09-11 [?] RSPM (R 4.5.0)
 P glue             1.8.0    2024-09-30 [?] RSPM (R 4.5.0)
 P gridExtra        2.3      2017-09-09 [?] RSPM (R 4.5.0)
 P gtable           0.3.6    2024-10-25 [?] RSPM (R 4.5.0)
 P hms              1.1.4    2025-10-17 [?] RSPM (R 4.5.0)
 P htmltools        0.5.8.1  2024-04-04 [?] RSPM (R 4.5.0)
   htmlwidgets      1.6.4    2023-12-06 [2] RSPM (R 4.5.0)
 P inline           0.3.21   2025-01-09 [?] RSPM (R 4.5.0)
 P jsonlite         2.0.0    2025-03-27 [?] RSPM (R 4.5.0)
 P kableExtra       1.4.0    2024-01-24 [?] RSPM (R 4.5.0)
 P knitr          * 1.50     2025-03-16 [?] RSPM (R 4.5.0)
 P labeling         0.4.3    2023-08-29 [?] RSPM (R 4.5.0)
   lattice          0.22-7   2025-04-02 [3] CRAN (R 4.5.1)
 P lifecycle        1.0.4    2023-11-07 [?] RSPM (R 4.5.0)
 P lme4             1.1-37   2025-03-26 [?] RSPM (R 4.5.0)
 P loo              2.8.0    2024-07-03 [?] RSPM (R 4.5.0)
 P lubridate      * 1.9.4    2024-12-08 [?] RSPM (R 4.5.0)
 P magrittr         2.0.4    2025-09-12 [?] RSPM (R 4.5.0)
   MASS             7.3-65   2025-02-28 [3] CRAN (R 4.5.1)
 P Matrix           1.7-4    2025-08-28 [?] RSPM (R 4.5.0)
 P matrixStats      1.5.0    2025-01-07 [?] RSPM (R 4.5.0)
 P memoise          2.0.1    2021-11-26 [?] RSPM (R 4.5.0)
 P minqa            1.2.8    2024-08-17 [?] RSPM (R 4.5.0)
 P mvtnorm          1.3-3    2025-01-10 [?] RSPM (R 4.5.0)
   nlme             3.1-168  2025-03-31 [3] CRAN (R 4.5.1)
 P nloptr           2.2.1    2025-03-17 [?] RSPM (R 4.5.0)
 P patchwork      * 1.3.2    2025-08-25 [?] RSPM (R 4.5.0)
 P pillar           1.11.1   2025-09-17 [?] RSPM (R 4.5.0)
 P pkgbuild         1.4.8    2025-05-26 [?] RSPM (R 4.5.0)
 P pkgconfig        2.0.3    2019-09-22 [?] RSPM (R 4.5.0)
   pkgload          1.4.1    2025-09-23 [2] RSPM (R 4.5.0)
 P plyr             1.8.9    2023-10-02 [?] RSPM (R 4.5.0)
 P posterior        1.6.1    2025-02-27 [?] RSPM (R 4.5.0)
 P processx         3.8.6    2025-02-21 [?] RSPM (R 4.5.0)
 P ps               1.9.1    2025-04-12 [?] RSPM (R 4.5.0)
 P purrr          * 1.1.0    2025-07-10 [?] RSPM (R 4.5.0)
 P QuickJSR         1.8.1    2025-09-20 [?] RSPM (R 4.5.0)
 P R6               2.6.1    2025-02-15 [?] RSPM (R 4.5.0)
 P rbibutils        2.3      2024-10-04 [?] RSPM (R 4.5.0)
 P RColorBrewer     1.1-3    2022-04-03 [?] RSPM (R 4.5.0)
 P Rcpp           * 1.1.0    2025-07-02 [?] RSPM (R 4.5.0)
 P RcppParallel     5.1.11-1 2025-08-27 [?] RSPM (R 4.5.0)
 P Rdpack           2.6.4    2025-04-09 [?] RSPM (R 4.5.0)
 P readr          * 2.1.5    2024-01-10 [?] RSPM (R 4.5.0)
 P reformulas       0.4.1    2025-04-30 [?] RSPM (R 4.5.0)
   remotes          2.5.0    2024-03-17 [2] RSPM (R 4.5.0)
   renv             1.1.5    2025-07-24 [1] RSPM (R 4.5.0)
 P reshape2         1.4.4    2020-04-09 [?] RSPM (R 4.5.0)
 P rlang            1.1.6    2025-04-11 [?] RSPM (R 4.5.0)
 P rmarkdown        2.30     2025-09-28 [?] RSPM (R 4.5.0)
 P rstan            2.32.7   2025-03-10 [?] RSPM (R 4.5.0)
 P rstantools       2.5.0    2025-09-01 [?] RSPM (R 4.5.0)
 P rstudioapi       0.17.1   2024-10-22 [?] RSPM (R 4.5.0)
 P S7               0.2.0    2024-11-07 [?] RSPM (R 4.5.0)
 P scales           1.4.0    2025-04-24 [?] RSPM (R 4.5.0)
   sessioninfo      1.2.3    2025-02-05 [2] RSPM (R 4.5.0)
 P StanHeaders      2.32.10  2024-07-15 [?] RSPM (R 4.5.0)
 P stringi          1.8.7    2025-03-27 [?] RSPM (R 4.5.0)
 P stringr        * 1.5.2    2025-09-08 [?] RSPM (R 4.5.0)
 P svglite          2.2.1    2025-05-12 [?] RSPM (R 4.5.0)
 P systemfonts      1.3.1    2025-10-01 [?] RSPM (R 4.5.0)
 P tensorA          0.36.2.1 2023-12-13 [?] RSPM (R 4.5.0)
 P textshaping      1.0.4    2025-10-10 [?] RSPM (R 4.5.0)
 P tibble         * 3.3.0    2025-06-08 [?] RSPM (R 4.5.0)
 P tictoc         * 1.2.1    2024-03-18 [?] RSPM (R 4.5.0)
 P tidyr          * 1.3.1    2024-01-24 [?] RSPM (R 4.5.0)
 P tidyselect       1.2.1    2024-03-11 [?] RSPM (R 4.5.0)
 P tidyverse      * 2.0.0    2023-02-22 [?] RSPM (R 4.5.0)
 P timechange       0.3.0    2024-01-18 [?] RSPM (R 4.5.0)
 P tzdb             0.5.0    2025-03-15 [?] RSPM (R 4.5.0)
   usethis          3.2.1    2025-09-06 [2] RSPM (R 4.5.0)
 P utf8             1.2.6    2025-06-08 [?] RSPM (R 4.5.0)
 P vctrs            0.6.5    2023-12-01 [?] RSPM (R 4.5.0)
 P viridisLite      0.4.2    2023-05-02 [?] RSPM (R 4.5.0)
 P withr            3.0.2    2024-10-28 [?] RSPM (R 4.5.0)
 P xfun             0.53     2025-08-19 [?] RSPM (R 4.5.0)
 P xml2             1.4.1    2025-10-27 [?] RSPM (R 4.5.0)
 P yaml             2.3.10   2024-07-26 [?] RSPM (R 4.5.0)

 [1] /workspaces/afec-bayes-2025/renv/library/linux-ubuntu-noble/R-4.5/x86_64-pc-linux-gnu
 [2] /usr/local/lib/R/site-library
 [3] /usr/local/lib/R/library

 * ── Packages attached to the search path.
 P ── Loaded and on-disk path mismatch.

──────────────────────────────────────────────────────────────────────────────